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D The prevalence of structure in biological populations challenges fundamental assumptions at 

Mh the heart of continuum models of population dynamics based on mean densities (local or global) 

— only. Individual-based models (IBM's) were introduced over the last decade in an attempt to 

overcome this limitation by following explicitly each individual in the population. Although the 
IBM approach has been quite insightful, the capability to follow each individual usually comes 
^T^ at the expense of analytical tractability, which limits the generality of the statements that can 

p I be made. For the specific case of spatial structure in populations of sessile (and identical) 

Q organisms, space-time point processes with local regulation seem to cover the middle ground 

. ^ between analytical tractability and a higher degree of biological realism. This approach has 

shown that simplified representations of fecundity, local dispersal and density-dependent mor- 
tality weighted by the local competitive environment are sufficient to generate spatial patterns 
that mimic field observations. Continuum approximations of these stochastic processes try to 
I distill their fundamental properties, but because they keep track of not only mean densities, 

^ but also higher order spatial correlations, they result in infinite hierarchies of moment equa- 

^SJ tions. This leads to the problem of finding a 'moment closure'; that is, an appropriate order 

0^ of (lower order) truncation, together with a method of expressing the highest order density not 

explicitly modelled in the truncated hierarchy in terms of the lower order densities. We use the 
principle of constrained maximum entropy to derive a closure relationship at second order using 
normalisation and the product densities of first and second orders as constraints, and apply it 
to one such hierarchy. The resulting 'maxent' closure is similar to the Kirkwood superposition 
I approximation, or 'power-3' closure, but it is complemented with previously unknown correction 

• • terms that depend on integrals over the region for which third order correlations are irreducible. 

. 5^ The region of irreducible triplet correlations is found as the domain that solves an integral equa- 

tion associated with the normalisation constraint. This also serves the purpose of a validation 
^ check, since a single, non-trivial domain can only be found if the assumptions of the closure are 

consistent with the predictions of the hierarchy. Comparisons between simulations of the point 
process, alternative heuristic closures, and the maxent closure show significant improvements in 
the ability of the truncated hierarchy to predict equilibrium values for mildly aggregated spa- 
tial patterns. However, the maxent closure performs comparatively poorly in segregated ones. 
Although the closure is applied in the context of point processes, the method docs not require 
fixed locations to be valid, and can in principle be applied to problems where the particles move, 
provided that their correlation functions are stationary in space and time. 
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1 Introduction 



One of the most widely used models in theoretical ecology is the logistic equation [501 EHl ES] 

|mi(t) = rmi(0(l-^) (1) 
mi(0) = no, 

which describes the dynamics of a population in terms of a single state variable mi{t), which can be 
interpreted as the total population size or as the global density. The rate of change of the density 
in the logistic model is determined by three drivers. The first two are present in the net growth 
term r = b — d, where b and d are respectively the per capita fecundity and intrinsic mortality 
rates. The third one is the density-dependent mortality rate, which is assumed to be proportional 
to the density, where the constant of proportionality K is the 'carrying capacity', i.e. the maximum 
number of individuals per unit area or volume that can be supported by some unspecified limiting 
resource. This model is built on the following set of assumptions [H \19\ Hlj: 

1. There are no facilitative interactions among conspecifics. 

2. Contributions to mortality due to competition are pairwise additive. 

3. The limiting resource is uniformly distributed in space, and shared proportionally by all 
individuals. 

4. There are no differences among individuals in age, size or phenotype. 

5. The spatial locations of the individuals are uncorrelated. 

6. Allocation to reproductive tissues is independent of the local resource availability. 

7. Density-dependent mortality occurs at the same temporal scales than fecundity and intrinsic 
mortality. 

These assumptions are valid only for a rather restricted set of biological situations. For instance, 
facilitative interactions are known to play a determinant role alongside competition in shaping 
community structure and dynamics [9]. In plant communities, non-succesional positive interac- 
tions can result from additional resources being made available through synergies (e.g. hydraulic 
lift, microbial enhancement, mycorrhizal networks), a reduction in the impact of climate extremes 
and predation [31j or a combination of these. The assumption of pairwise additivity in density- 
dependent mortality enjoys some degree of empirical support for plant populations [7H], but it is 
still an unresolved issue [171 [22]. Forms of population structure driven by size (or age), phenotype 
or spatial pre-patterning in the abiotic substrate having an impact on fecundity, recruitment and 
survivorship are ubiquitously observed both in the field and experimental literature [571 E3] [67j . 
Seed dispersal and competitive interactions are known to occur over a characteristic range of spa- 
tial scales rather than being uniformly distributed as is commonly assumed in the logistic model 
[I21[291[M1[671[681I7Q1[I0]. 

These limitations have motivated the search for alternatives to the logistic equation that can 
address questions of broader biological interest, while simultaneously maintaining a reasonable de- 
gree of mathematical and computational tractability. Achieving this goal depends heavily on the 
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development of multiscale modeling approaches capable of linking patterns manifested at the larger, 
population-level scales, to their drivers, which lie in biological processes occurring at the level of 
individuals; typically taking place over spatial and temporal scales that differ substantially from 
those at which the population-level regularities are detected [ill6l[T9 t[23lli3lHil l^H5l[6T]. 

Among all the possible paths suggested as one relaxes these assumptions ([l}|7]), understanding 
the role of spatial structure, particularly that driven by biological processes alone, has received 
a considerable amount of interest [20l HH HI El [TJ |62l |T2l |8l [36] . The approaches that have been 
developed for the spatial problem have a number of commonalities. They usually consist of an 
individual-based model (IBM) \18\ [30] which follows simplified representations of the life histories 
of each individual in the population. These representations include the biological processes believed 
to play a role in driving the population-level phenomena, and typically include a combination of 
fecundity, dispersal, mortality and in some cases, growth. These are modeled in such a way that 
some form of density-dependent regulation is present in at least in at least one of them. Second, the 
density-dependent regulation is determined by the neighborhood configuration surrounding each 
focal individual, which leads to a local regulation of the process JSl [Ml [26] . Third, the dynamics of 
the macroscopic patterns is obtained from an average of a sufficiently large number of independent 
realisations of the individual-level model. Insights about the emergence of various forms of popu- 
lation structure, in particular space, are gained as these broad scale patterns are allowed to vary 
with the characteristic scales that regulate the biological processes at the level of the individual 
organism [3 iSl [Ml [SS [77] . 

This approach, albeit insightful, restricts severely the statements that can be made about how 
the processes present across various scales interact to produce pattern, since typically there is an 
absence of a model condensing the dynamics of pattern at the larger scale. To circumvent this 
deficiency, several attempts to derive population-level models from the IBM have been introduced 
in the literature. In the context of spatial pattern in plant population dynamics [H Il3l [36l 162] . 
these models typically take the form of hierarchies of equations for relevant families of summary 
statistics where quantities in addition to the mean density capture spatial correlations among pairs, 
triplets etc, that quantify spatial pattern across a range of scales [121 [72]. These summary statis- 
tics are closely related to the central, factorial or raw spatial moments of the underlying spatial 
stochastic process. For pair configurations in plant population models, common choices are the 
spatial auto-covariance or the second order product density [l2l|lll|2Tl [72] . A discussion of these 
various approaches in the development of continuum approximations to spatio-temporal stochastic 
processes in ecology can be found in a compilation edited by Dieckmann et al [20] . 

The non-linearities due to the presence of density-dependence in spatially explicit IBM's in- 
evitably result in infinite hierarchies of evolution equations for the summary statistics, where the 
dynamics of the correlations of order k is tied to that of order k + 1. If one truncates the hierarchy 
at some order, the evolution equation at the order of the truncation will depend on the unknown 
density of the next higher order. Analysis of these hierarchies can only proceed after truncation 
for some small order. This requires the solution of two problems. The first, is identifying an ap- 
propriate order of truncation k. The second is compensating for the resulting loss of information. 
The order of truncation in existing models is chosen on the basis of computational complexity, and 
rarely goes beyond two [69l [H |l3] . For the second problem, the density of order A; -|- 1 is replaced 
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by a functional relationship of all the densities of order up to k, usually called a 'moment closure'. 
This functional dependence of higher order quantities on lower order ones is constructed mainly 
on heuristic reasoning [H |T9l |5T]. For instance, when the order of truncation is two, assuming 
vanishing central moments of order three leads to the so-called 'power-1' closure [Ij. The 'power-2' 
closure arises from an analogy with the pair approximation used in discrete spatial models \36\ [T9] . 
Assuming independence of the three pair correlations associated with each edge of a triplet for all 
spatial scales leads to the 'power-3' or Kirkwood superposition approximation |4H [T9]. Although 
higher order closures do exist , they have restricted applicability due to the daunting computational 
problem that results at orders higher than three [69]. 

Despite some encouraging success that resulted in analytical solutions of the hierarchy at equi- 
librium for truncation at second order [HEllT], and remarkably good fit of the numerical solution of 
the hierarchy with individual-based simulations with so-called asymmetric versions of previously 
used closures [lU [51], most predict poorly the equilibrium densities even for situations of mild 
spatial correlations. In the cases where they succeed over a broader range of regimes of spatial 
correlations (i.e. the asymmetric power-2), the closure depends on tuning a set of weighting con- 
stants whose values can presently be found only by comparison with simulations of the stochastic 
process. A significant obstacle in the widespread adoption of these continuum approximations and 
their closures is that none of them is equipped with a criterion for their domain of validity that 
does not depend on comparisons with simulations of the individual-based model. Nevertheless, 
many of these heuristic closures do provide a better approximation to the dynamics of a spatially 
structured population than the logistic equation, and illuminate a variety of mechanisms by which 
endogenously generated spatial pattern appears in plant populations. 

Inspired by earlier results of Hillen [35] and Singer |69j , who used the principle of constrained 
maximum entropy \66\ HQ] [38j to respectively derive closures for velocity jump processes |52j 
and the BBGKY hierarchy arising in the statistical mechanics of fluids ^Tj, we develop a closure 
scheme based on constrained entropy maximisation for the moment hierarchy developed by Law 
& Dieckmann [33], constrained to satisfy normalisation and the product densities up to order 
two. In order to be able to relate the output of the entropy maximisation to the approximating 
dynamical system, we also reframe the hierarchy of Law & Dieckmann [33] in terms of product 
densities rather than the spatial moments. These two kinds of sets of summary statistics are very 
closely related, since the latter can be seen as estimators of the former. The approach of Hillen 
[35j consists of proving that the L^-norm over the space of velocities of the transport equation of 
Othmer et al [52] behaves like an entropy, with the velocity moments acting as constraints. Singer 
[69j treats the triplet product density as a probability density in order to construct an entropy 
from the point of view of information theory HO] I38j . using consistency of the marginals as 
constraints. Our approach diff'ers from these two other maximum entropy maximisation methods in 
a number of ways. First, we use the information theoretical entropy functional for point processes 

I16j . based on the negative of the expected log-likelihood, and includes all the orders that 
contribute spatial information, not just order three. Second, the product densities which provide 
the constraints are incorporated into the entropy functional by means of an expansion that allows to 
express the likelihoods (or Janossy densities) in terms of product densities and vice versa |13| [T3] , 
this allows us to establish a formal connection between the entropy functional and the moment 
hierarchy. Third, our closure is implicit, in the sense that the density of order three appears at 
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both sides of the closing relationship, thus allowing irreducible correlations of third order to be 
explicitly included. Fourth, the method presented here complements the Kirkwood (or power-3) 
closure with previously unknown correction terms that depend on the area for which the three 
points in the triplet become independent. These correction terms are important where the three 
particles in the triplet configuration are close to each other, but progressively vanish as these 
become separated, at which point the maximum entropy closure reduces to the classical Kirkwood 
superposition approximation. These correction terms lead to substantial improvements in the 
prediction of the equilibrium density for mildly aggregated patterns. In addition, the closure comes 
equipped with a criterion of validity stemming from the normalisation constraint. This validity 
check comes from an ancillary integral equation that returns the area of the domain at which the 
points become independent. This equation produces a single, non-trivial root when the correlations 
predicted by the moment hierarchy are consistent with the truncation assumptions, but fails to do 
so otherwise. The maximum entropy closure relationship we found is given by 
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where mi, m2, m3 are the first, second and third order product densities (the densities of the factorial 
moments of the underlying spatial point process), 6, 6 vector distances respectively linking 
the pairs of particles (a^i, 2:2) and (xi, X3) conforming a triplet configuration. The set ^0 is a circular 
domain of area |^o| that establishes the spatial scale for which triplet correlations are irreducible, 
and Jo(^) is the avoidance function (i.e. the probability of observing no points in A) of the spatial 
point process for the window A \13\ [H] . This set is found as the domain of integration that solves 
the normalisation condition 
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where is a circular domain of radius e centered at the origin. The set Aq is found by allowing 
the radius e to take positive real values until the equality in ([s]) holds. This closure is applied if 
the three points in the triplet lie inside ^0, and outside this region the classical Kirkwood closure 
applies. If the area of normalisation ^0 is small, the largest correction is due to the Jq term since 
the integral correction terms in the numerator and denominator tend to cancel each other, in which 
case the maxent closure is simply given by 
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where the exponential term corresponds to the avoidance function of a Poisson point process of 
mean density mi normalised with respect to the window Aq. 
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The paper is organized as follows. Section [2] discusses the locally regulated space-time point 
process model originally developed by Law &: Dieckmann [33], and includes a Gillespie- type simu- 
lation algorithm [28i |59j , together with known definitions and estimators for the product densities 
and some simulation results included for illustration purposes only. Broader simulation results for 
this process can be found elsewhere [HI [511 158] . Section [3] reframes the spatial moment equations of 
Law et al [H] in terms of product densities. Section [4] discusses the moment closure for truncation 
at second order based on constrained entropy maximisation. Section [5] discusses the numerical im- 
plementation of the closure and compares its predictions against simulations of the point process for 
mildly aggregated patterns. Finally, section [6] presents a critique of the maximum entropy method 
method, and suggests further areas of development. 

2 Spatio-temporal point process model 

We consider a single population of identical individuals, each of which can occupy arbitrary locations 
on a 2-dimensional continuous and bounded spatial arena A. The state of the population for each 
fixed time t is modeled as a realisation of a spatial point process, called the configuration or point 
pattern [H [721 [21], 

^pt{A) = {xi,. .. ,XNt] ^ (5) 
where the Xi are the spatial locations of all individuals found within A. Alternatively we have 

Nt{A) = #{xi,...,XN,} 

where Nt{A) stands for the total population counts within A, and the cardinality operator ^ counts 
the number of elements in a set. Note that in ([5]) both the locations Xi and the total counts Nt are 
random variables. The dynamics of the population is modeled by introducing a time component, 
where the updating times are also random variables, subject to local regulation |1H I14j . Two 
versions of this model have been introduced independently by Bolker & Pacala [3] and Dieckmann 
& Law [19j. Both share the key ingredients of non-uniform dispersal, and a density-dependent 
mortality term that depends on the configuration surrounding the focal individual which is the 
mechanism that introduces the local regulation. The configuration ^ evolves in time by sampling 
from two exponential distributions of waiting times that regulate the inter-event times between 
fecundity/dispersal and mortality events at the individual level, where the latter is determined 
from both intrinsic and density-dependent contributions. 



Table 1: Point process model parameters 



Parameter 


symbol 


units 


fecundity 


h 


time~^ 


intrinsic mortality 


d 


time~^ 


density-dependent mortality 




time~^ indiv^^ 


non-spatial carrying capacity 


K 


individuals 


dispersal scale 




length 


competition scale 




length 


initial population size 


No 


individuals 


spatial arena 


A 


length^ 
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2.1 Dispersal and fecundity 



Per capita waiting times between births are assumed to be exponentially distributed with constant 
parameter b, the birth (or fecundity) rate. If a birth occurs, the newborn is displaced instantaneously 
from the location of its mother Xi to a random new location Xj , sampled from the probability density, 
B{xi — Xj^as) the dispersal kernel^ where gb is a parameter that measures the characteristic 
dispersal length. The index i of the mother is chosen uniformly from the list of indices Ja = 
{1, 2, . . . , Nt{A)} in the configuration. 



2.2 Mortality 

The probability that a given individual i at location xi dies in the time interval (t, t + dt) is also 
assumed to be exponentially distributed with parameter m{xi), the total per capita mortality rate, 
given by 

m{xi) = d + dN ^ W{\xi- Xj\;aw), (6) 

j^i 6 J A 

where d, is the intrinsic mortality rate, and d^ is the density-dependent mortality rate. In order 
to allow comparisons with the predictions of the logistic model ([l]) we defined it as d^ = {b — d)/K, 
where K is the non-spatial carrying capacity (the expected value at equilibrium under complete 
spatial randomness). This second 'mortality clock' is rescaled by a weighted average of the local 
configuration around the focal individual, so that mortality due to competition is more likely to 
occur in locally dense regions than in comparatively sparse ones. The contributions of neighbors 
to the mortality of Xi are assumed to decay monotonically with distance. This is modeled by a 
normalized, radially symmetric weighting function I1^(|i^| the mortality kernel, that vanishes 

outside a finite interaction domain Dw , where aw is a parameter associated with the characteristic 
length scale of competitive interactions. This function is interpreted as an average effect that 
simplifies the details of the physiology of mortality due to crowding. The parameters of the model 
are summarized in Table [H 



2.3 Simulation algorithm 



A sample path for the space-time point process with rates described in Sections 2.1 and 2.2 can be 
simulated by a variant of the Gillespie algorithm |28| [59] . The spatial arena can be identified with 
the unit square W = [0, 1] x [0, 1] (after rescaling the parameters in the interaction kernels), with 
periodic boundary conditions. The initial population consists of A'^o individuals, and [0, Tmax] is 
the time interval of interest. 

1. Generate the configuration at time t = 0, (po = {{xi,yi); . . . ; (a^AfQ, yAfg)}, from two indepen- 
dent sets of Nq deviates from U{0, 1), Xq = {xi, . . . , xnq} and Yq = {yi, . . . , yNo}- 



2. While the elapsed time t is less than Tmax do: 



(a) Generate a birth waiting time Tf, from the exponential density with parameter bNt, 
where Nt is the number of individuals that are alive at time t. 
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(b) Generate the set of mortality waiting times Tm = {n, . . . , tat^} from a set of exponen- 
tial densities, each with parameter m{xi) = d + Ylj^ti — Xj\), for each of the 
i = 1, . . . Nt individuals in the configuration at time t 



(c) The time until the next event is given by = min{Th U Tm}- 



i. A birth occurs if r„ = Tb, in which case the location of the newborn individual Xb is 
given by 

Xf) — Xp -\- ^ 

where the index of the parent p is drawn uniformly from the set of indices Ja and 
the displacement ^ is drawn from the dispersal kernel -B(C). The configuration is 
then updated to include the newborn 

ift+Tt y^t^ {xb}- 

ii. If Tn 7^ Tb then the next event is a death in which case the i-th individual in T^ for 
which Ti = Tn is removed from the configuration 

ft+T„ -^ft\ {Xi} 

(d) Update the elapsed time t — )■ t + r„. 



2.4 Summary statistics 

The specific configurations resulting from simulations of the algorithm in Section |2.3| are of lim- 
ited interest. The fundamental question is understanding how spatial correlations develop from an 
unstructured initial condition, and how the equilibrium density departs from the logistic behavior 
when considering an ensemble of simulations for various combinations of the spatial scales of com- 
petition and dispersal [H H3l [T9] . This requires a set of summary statistics capable of distinguishing 
various forms of spatial structure for the same population size (see Figure [T]) . A useful set for this 
task are the product densities (or densities of the factorial moments), i.e the densities of the ex- 
pected configurations involving one, two or more distinct points after removing self-configurations 
[72l [HI El]. For spatially stationary point processes, these are functions of the inter-point dis- 
tances between the points comprising an expected configuration of a certain order k. The product 
densities are defined in terms of the population count Nt{B) observed through some window B at 
time t defined as [TSJ dH E] 

Nt{B)= Isixi), (7) 
where I six) is the indicator function of the set B defined by 

"^^^"^'^ otherwise. 

The coarsest is the mean density (or intensity) which measures the expected number of individuals 
per unit area at each time, defined as 

, , E{Nt(SJx))} 
mi {x,t) = hm ^ / I ^ 9 

eiO \Se{x)\ 
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where S^ix) is the open bah of radius e centered around x, and \A\ is the area of the window A. 
Since the mortahty and fecundity rates do not depend specific locations but on relative distances, 
and both the dispersal or competition kernels are symmetric by definition, the spatial point process 
is spatially stationary and isotropic, in which case the mean density is constant for each fixed time 

mi(x , t) = mi{t). 

A naive estimator for the mean density from a single realisation is |72^ [21] 

^iW = ^;^ (10) 

where Nt{A) is as in ([7]). If an ensemble of Q independent replicates of the process is available, 
this estimate can be improved by averaging over the ensemble 

(U) 

For a Poisson process, the mean density ^ is a sufficient statistic for the process. More general 
cases require keeping track of spatial correlations. Higher order quantities are required to distinguish 
between aggregated (or clustered), random and segregated (or over-dispersed) point patterns with 
the same mean density (see Figure [T|. For this purpose we need, at the very least, information 
about two-point correlations. These are measured by the pair correlation function, defined as the 
ratio 

which requires knowledge of the density of the expected number of pairs at spatial lag ^, measured 
by the second order product density m2(^,t). 

^ . ,^ _ E {iV,(ge(0) ) [ Nt{S.{0 + ) - 5o{S.{Q + Q) ]} 

"^^^'*^-4^ |S.(0)| 15.(0 + 01 ^ ^ 

where ^.(O) and ^.(O + O are small windows of observation respectively centered at the origin, and 
at distance ^ from the origin. The Dirac measure in the second factor in the numerator removes 
the count at zero lag from the second window in order to avoid self-configurations. In general. 



the definition (13) centers the count for each specific location x, but given that in our case the 
process is stationary and isotropic by construction, it can be translated to the origin without loss 
of generality, in which case m2 depends only on the spatial lag ^. 

In the case of a spatially random configuration (a Poisson point process), the counts on non- 
overlapping windows are independent of each other and thus the second order density is simply the 
square of the mean density. Correlations of configurations involving k points are simply the A;-th 



powers of the mean density |2H I72j. The pair correlation function (12) is the lowest order product 
density that allows detection of departures from complete spatial randomness. Thus, values of 
the pair correlation function greater than one for some lag ^ indicate aggregation at that scale, 
whereas values below one signal segregation. Estimation of the pair correlation function requires 
an estimator of the squared density [72] 

-2(,._{Nt{A) [Nt{A)-l])n 
mi[i) — 
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Figure 1: The three upper panels show different types of point patterns sharing the same number 
of points N{A) = 136, where the window A is the unit square. The left panel shows aggregation, 
the center panel corresponds to complete spatial randomness and the the right panel displays a 
segregated pattern. In the aggregated pattern we see the tendency of points to occur near each 
other. By contrast in the regular pattern points tend to avoid each other at short spatial scales. The 
lower three panels show estimates of the pair correlation function g2 (r) for each of the three point 
patterns at the top. The lower left panel indicates aggregation at short scales but segregation at 
intermediate ones. In the lower center panel the pair correlation function oscillates rapidly around 
one, which signals randomness, and the lower right panel indicates a tendency to segregation at 
short scales. 
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together with a kernel density estimator for the second order product density [64:\ [7T] , 



i j^i 



where r is the spatial lag, h is the bandwidth of the kernel density estimate kh, the points Xi belong 
to a configuration (pt{A) sampled at time t, and || the Euclidean distance between the 

points Xi and Xj. The denominator is an edge corrector that rescales the count in the numerator 
by the area of the intersection of the window of observation Ax^ shifted so that its centered around 
the point Xi , with the window Axj shifted around Xj [HI [l2l [72] 

Ax^ = {x + Xi : X £ A}. 



If an ensemble of independent realisations is available, the single realisation estimator (14) can be 
improved by means of an ensemble average 



As before, the angle brackets ()q represent an average of the estimates across a number of in- 
dependent sample paths Vt. For the smoothing kernel kh a common choice is the Epanechnikov 
kernel 



where I is the indicator function ([s]) . Although empirical methods for selection of the bandwidth 
h are widely used, for instance the rule |7T] 



h = c/yjrhi{t), c G (0.1,0.2), 

data-driven methods for optimal choices of h based on cross-validation have been recently intro- 
duced [33l[3l]. In general, the product density of order k is defined as [2j 



mfc(a;i, . . . = lim E <( TT — — V, (15) 

40 1^^^^^^ \Se{Xj)\ J 

where "Yli^i^i ^xi{Se{xj)) removes self j-tuples for j > i. In the case of spatial stationarity and 
isotropy, the specific locations xi, . . . , can be replaced by the relative distances ^i, . . . , ^^-i) 

nikiii-, ■ ■ .,(,k-i,t), 

and the k-th correlation function becomes, 

5fc(6,---,Cfc-i;i) = TTTT (16) 

mf (t) 

which is interpreted in a similar way to the pair correlation function, but considering /c-plets instead 
of pairs. 
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2.5 Point process simulation results 



For the convenience of the reader, simulation results for the point process are shown in Figure [2| 
with the same parameter values as in Law et al [44j, but obtained from code developed indepen- 
dently. The spatial arena is the unit square, and the kernels are both radially symmetric Gaussians, 
but the mortality kernel is truncated (and renormalized) at 2iaw- The left panel shows estimates 
of the mean density versus time for various values of the characteristic spatial scales of dispersal 
and mortality. The right panel shows the pair correlation function at the end of the simulation for 
each of the four spatial regimes for which the population persists. Both quantities were estimated 
from an ensemble of 300 independent sample paths. 

Case (b) in both panels corresponds to dispersal and mortality kernels with large characteristic 
spatial scales {as = 0.12, ay/ = 0.12). In this situation there is enough mixing to destroy spatial 
correlations — confirmed by the almost constant pair correlation function — and the mean density 
equilibrates at a value that is very close to the non-spatial carrying capacity {K = 200). Case (a) 
shows results for a segregated (or regular) spatial pattern that arises from very local competitive 
interactions, but long range scales of dispersal {as = 0.12, aw = 0.02). In this situation local 
densities experienced by the focal individual are lower than the random case (the pair correlation 
function is below one), which results in equlibrium densities that equilibrate at higher values than 
the non- spatial carrying capacity. This results from the ability of newborns to escape locally 
crowded regions via the long range dispersal kernel. Case (c) is associated to a segregated pattern 
of clusters, which is the converse situation of the segregated pattern with very localized dispersal, 
and mild competition distributed over a longer range {as = 0.02, aw = 0.12). The oscillations of 
the pair correlation function indicate two scales of pattern. There is short scale aggregation, but 
the clusters themselves form a segregated pattern with respect to each other, so the local crowding 
due to clustering that should lead to high density-dependent mortality is compensated by the 
overdispersion. Overall, the local competitive neighborhood experienced by an individual in this 
situation is more crowded than in a random distribution of points, which results in a mean density 
that equilibrates at lower values than the non-spatial carrying capacity. Case (d) corresponds to 
a mildly aggregated pattern (as = 0.04, aw = 0.04), where there is a single scale of aggregation. 
Even for small departures from complete spatial randomness such as this one, the effect of the spatial 
pattern in the dynamics of the mean density is substantial, since we see a reduction of about 30% 
in the equilibrium density in this case with respect to that of complete spatial randomness. Finally, 
case (e) indicates an extreme case of aggregation, with very intense, local mortality and dispersal 
(cjb = 0.02, aw = 0.02), where the population goes to extinction (exponentially) after a short 
growth transient. 



3 Moment equations and the closure problem 



The central problem associated with the space-time point process described earlier in Section 2.3 
is to obtain a closed form expression for the finite dimensional distributions, 

Fk{Ai,...,Ak,ni,...,nk;t}, (17) 

that determine the probability of observing ni points in the window Ai, n2 points in the window 
A2, and so forth up to the rik points in Ak at time t, from the definition of the space-time point 
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Figure 2: The left panel shows estimates for the mean density rhi{t) from an ensemble oi Q = 300 
realisations, for various characteristic spatial scales of dispersal and density-dependent mortality. 
The dotted lines are the envelopes for one standard deviation. The right panel shows the corre- 
sponding estimates for the pair correlation function g2ir) at the end of the simulation. The other 
parameters, b = 0.4, d = 0.2, K = 200, are fixed for all cases. The spatial arena is the unit square 
with periodic boundaries. 
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process discussed in the previous section. Unfortunately, this seems to be remarkably difficult, due 
to the presence of the non-linearity in the mortality rate in ([g]), and the localized nature of dispersal 
[24j . However, the question of ecological interest is understanding the modifications that should be 
introduced to the logistic equation ([T]) in order to account for the effects of spatial correlations in 
the dynamics of the mean density. This can be accomplished by deriving evolution equations for the 



product densities (which are the densities of the factorial moments of ( 17)) from the transition rates 
of the point process discussed in the previous Section. Following a Master equation approach similar 
to that used by Bolker & Pacala [3j and Dieckmann &: Law |19j . we derive the following hierarchy 
of product density equations (see Appendix A). The first member in this hierarchy corresponds to 
the modified or 'spatial' logistic equation |49j . 



m,{t)=rmi{t)-dN I 1^(6) ^2(^1, t) d^l. (18) 



dt 

where r = b — d, d^ = {b — d)/Ks and VF(^i) is the mortality kernel in ([6]). Ks is the spatial 
carrying capacity, or the number of individuals per unit area that can be supported under random 
mixing 



Equation ( 18 ) shows that the required modification of the logistic equation consists of substituting 
the quadratic term with an average of the second order product density r?T,2(^i,t) weighted by the 
mortality kernel VF(Ci). This term computes the effective number of neighbors n^Q that contribute 
to density-dependent mortality, 

"effW= / W{^i)m2{ei,t)dei. 
Thus, the effect of mortality on the evolution of the mean density is tied to a weighted average of the 



mortality kernel with the two-point spatial correlations in the process. Equation (18) reduces to the 
logistic equation for the Poisson point process, in which case ^2(^1) = mi^. In aggregated spatial 
patterns, m2 exceeds mi^ for some domain. If mortality is modeled by a kernel that penalizes close 
proximity over the same range of scales where aggregation is detected, then the effect of mortality 
due to competition is stronger in this case than that of the logistic equation, in which case the 
density equilibrates below Ks (Figure [2| cases (c),(d) and (e) ). The opposite situation occurs in 
segregated patterns, where m2 is less than mi^ at the scales where the mortality kernel penalizes 
aggregation. As a result, the effect of competition on mortality is milder than in a random spatial 
pattern, in which case the mean density equilibrates at values greater than Ks (Figure [2| case (a)). 



Equation (18) depends on the unknown second order density m2. A similar procedure to that used 



in the derivation of (18) one obtains the evolution equation for this quantity 
1 d 



: b[ B{i2)m2{ii-i2,t)d^2 + bB{ii)mi{t)-dm2{ii,t) 
- dNW{ii)m2{iut)-dN f W{i2)m^{ii,i2,t)di2- (19) 

JM2 

Here the role of dispersal and competition kernels as the main pattern drivers can be clearly 



discerned [H O [121 SI] • The first two terms in (19), related to fecundity and dispersal, are 



' / B{i2)m2{ii-i2]t)di2 + bB{ii)mi{t). 
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Figure 3: Schematic representation of a spatially stationary triplet configuration. The pair densities 
are evaluated at each inter-event (vectorial) distances 6) 6 and 6 ~ 6 

Both are nonnegative by definition for all values of 6 and t. The rate of change of m2 increases 
due to their effect, and thus they drive aggregation at the scales controlled by the characteristic 
spatial scale of the dispersal kernel. The convolution measures the creation of pairs along 6 due 
to dispersal of the third member of the triplet along the 6 ~ 6 edge (Figure [S]). The second term 
measures the creation of pairs along the 6 edge due to the dispersal events generated the individual 
at the origin of 6- The remaining terms due to mortality are, 



All the three terms are negative, and thus contribute to the destruction of pairs along the 6 edge, 
leading to segregated patterns. The first term measures intrinsic mortality of both members of 
the pair and the remaining ones are related to density-dependent mortality. The second, measures 
mortality of the pairs due to competition at the scales controlled by the mortality kernel. The last 
term measures the destruction of the pair along the 6 edge due to the effect of competition with 
the additional member of the triplet located along the 6 edge. 

These terms for both dispersal and mortality are initially calculated by fixing the count at 
the origin of 6 and let the count at the end of 6 vary according to the fecundity, dispersal and 
mortality terms. Symmetry considerations require consideration of the reverse situation, where the 
count at the end of 6 is fixed, and the origin is allowed to vary. Since these are symmetric, these 
additional terms lead to the factor of 1 /2 on the left hand side of the equation for the second order 
product density. 
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4 Moment closure by Shannon entropy maximisation 



The product density equations ( |18| ) and ( 19 ) cannot be solved in that form because the evolution 
equation for the second order density has a mortality term that depends on a weighted average 
of the third order one. Although it is possible to derive an additional evolution equation for this 
quantity, it will involve an unknown fourth order term, leading to a system that is not closed. In 
general, the evolution equation for the density of order k will depend on the density of order k + 1. 
This gives rise to two problems, known together as 'a moment closure' jHHS]. The first is choosing 
an appropriate order of truncation k, and the second is finding an expression for the product den- 
sity of order k+1 in terms of the densities of orders up to k (or k+1 in the case of an implicit closure). 

Ideally, the order of the truncation should be based on an understanding of the convergence 
properties of the hierarchy in order to establish error bounds. In practice, the order of the trunca- 
tion is determined by the computational cost of the numerical solution, which is determined by the 
size of the arrays that can be stored and operated on efficiently. Explicit representation of third 
order terms already requires least 3.2 Gb of memory using double precision and a relatively coarse 
discretisation of 100 grid points per dimension. This situation pretty much constrains to three the 
highest order density that can be represented explicitly. 

From an applied perspective, the first and second order terms are of greatest interest, since 
these respectively encode the dynamics of the average density and the spatial covariance. The 
latter can be interpreted biologically as the average environment experienced by an individual as a 
function of spatial scale \4:3\ IH] . The shape of the second order correlation function can be used to 
distinguish between aggregated, random and segregated spatial patterns sharing the same average 



density (see Section 2.4). 



Closure problems are pervasive in the statistical mechanics of fluids where thermodynamic 
quantities are derived from the statistical properties of the particle distributions [69] [601 1321 ESI H?! 
141] . Here our intent is somewhat similar in the sense that a detailed individual-based model is used 
to inform a mean-field model that does not neglect the role of spatial fluctuations in density due 
to endogenously generated spatial structure structure [H [3 HI] . Within spatial ecology, moment 
closures have been proposed with varying degrees of success, using a suite of methods, among which 
we have: 

• Heuristic reasoning, where consistency arguments are used to construct closing relationships 

[Hmaisiiii]. 

• Distributional properties, where closures are based on assuming a functional form for the 
distribution of the process 



• Variational methods, where it is assumed that the unknown distribution optimizes some 
meaningful functional, usually an entropy-like object |35l |69| 

In order to make the paper reasonably self-contained, we shall briefly review closures based on 
heuristic reasoning, which have dominated work in this problem. Additional information can be 
found in a recent review by Murrell et al [51] . 
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Figure 4: Closure comparison. Panel (a) shows the mean density rhi{t) of the point process versus 
time averaged over 300 sample paths (blue) up to a simulation of 300 time units. The continuous 
black line shows the predicted mean density from the moment equations with the power-3 or 
Kirkwood closure, the dashed black line corresponds to the power 2 closure. The dash-dot line 
corresponds two the power 1 closure. Panel (b) shows the pair correlation function at time t = 300 
(blue), indicating aggregation at short scales, but segregation at intermediate ones. The black line 
corresponds to the pair correlation function predicted by the solution of the moment hierarchy with 
the power~3 closure, and the dashed line corresponds to the power 2. 

4.1 Heuristic methods of moment closure 

Heuristic closures are usually based on self-consistency arguments. For instance, they should be 
strictly positive and invariant under permutations of the arguments |21| , I1 H [T^. Also, if correlations 
are assumed to decay monotonically with distance, then there is a distance d beyond which the 
particles become uncorrelated and thus higher order densities become simple powers of the mean 
density. Although a large number of functional forms can be chosen in order to satisfy these 
minimum requirements, the simplest ones usually involve additive combinations of various powers 
of the second and first moments. For instance, if one further assumes that central third moments 
vanish, the resulting expansion in terms of product densities, leads to the power-1 closure, dubbed 
that way because the highest occurring power of the second order density is one [H O [TJ [19] , 

"1-3(6,6) = mi 771-2(6) + 1^1 ^2(6) + mi 7712(6 - 6) - 2 7771^. (20) 

This closure has the attractive property of preserving the linearity of the moment hierarchy, which 
allows the derivation of analytical results at equlibrium [H |5] • It is quite successful at low densities 
(7771* ~ 20) and 1-dimensional systems. However, at intermediate to high densities (7771 ~> 100) 
aggregated patterns, this closure predicts extinction in situations where the point process persists 
(see dash-dot line in panel (a) in Figure |4]), even for mild correlation regimes. It is nonetheless a 
useful benchmark result. 
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The power-2 closure is obtained as a continuous space analogue to the pair approximation used 
in discrete spatial systems |61j . 

"13(41,^2) = \ \ 2mi ; (21) 

nil "T-i 777-1 

this closure does predict a persisting population. However, it underestimates quite strongly the 
second order density, which leads to overshooting the mean density (see panel (b) in Figure |4| 
dashed black line). It is non-linear and thus solutions have to be obtained numerically. There are 



asymmetric versions of this closure that consist of rescaling each additive term in (21) with a set 
of weighting constants [44 ^ [HT] . Law et al 03] showed that a particular combination of weighting 
constants provides a very good fit to simulations. However, this result is difficult to generalize as 
there is no theory informing how these constants are chosen, since they depend on the details of 
the model |51] . and can only be found by comparisons with simulations of the IBM. 



Finally, the power-3 or Kirkwood closure (23) has a distinguished tradition in the statistical 



mechanics of fluids |4H I41j . Recently, Singer [69j showed that this closure can be obtained in 
the hydrodynamic limit after invoking a maximum entropy principle to truncate the BBGKY 
hierarchy. Earlier motivations for this closure were based on the assumption that each of the pair 
correlation functions associated with the three edges of the triplet configuration (see Fig. [3]) occurs 
independently of each other at all spatial scales, 

93{xi,X2,Xs) = g2{xi,X2) g2{xi,X3) g2{x2,X3). (22) 

Substituting the definition of the k-th correlation function in terms of the product densities (|16|) 



into (22) for = 3 yields a version of the Kirkwood closure (22) that can be used to close the 



equation at second order (19) 



t N "^2(6)"T-2(6)"T-2(6 -6) 

"13(6,6) = 5 • (23 

TTJ-i'^ 

This closure also underestimates the second order density, but less dramatically so than the power- 
2 closure, which results in a slightly better prediction of the mean density (see panel (b) in Figure 
[4]). Despite its appealing simplicity, the power-3 closure shares the same limitations of the other 
heuristic closures, e.g. there is no criterion of validity, and it provides poor fit to the equilibrium 
density even for mildly aggregated patterns [SH] [IH]- Heuristic closures have reasonably good 
performance in random and segregated spatial configurations, but are significantly more limited 
in aggregated regimes, with the sole exception of the asymmetric power-2. Their limitation arises 
from the implicit assumption that there are no irreducible triplet correlations at any scale, in the 
sense that after fixing a pair that forms an edge, for instance the points xi and X2 (see Fig[3|, 
the two other edges of the triplet formed with the third point 2:3 occur independently of how the 
first edge is chosen. This can only be true when the three points are sufficiently far apart, but 
irreducible third order correlations are likely to occur when the three points are close together in 
aggregated patterns (Figure |6]). 

4.2 The Maxent closure 

The concept of entropy from an information theoretic point of view, as opposed to the thermo- 
dynamical definition of entropy, is tightly related to the uncertainty (or information content) as- 



18 



sociated with an outcome of a random variable. It can be shown that the information content 
of a particular outcome {x' + dx') of random variable x with probability density p{x), is given 
by Iog[p(2;')dx'] ^Ml HOl- The entropy functional is constructed by taking the expected value of 
the information content over all the possible outcomes of x [66l ESI SO] • To illustrate what this 
means, consider the uniform distribution on an interval [a,b] G M^. It is not surprising that this 
distribution maximizes the entropy functional if no constraints are introduced, since all the values 
in its domain of definition have the same probability weight, thus the uncertainty about a specific 
outcome of a random variable with this distribution is maximal. The opposite situation occurs 
for the Dirac delta distribution which is centered on one single value, say x'. In this situation, 
a single value occurs with probability one, and all the others have probability zero, therefore the 
uncertainty about an outcome of this (pathological) random variable is null. 

The principle of maximum entropy is a powerful method that allows the derivation of prob- 
ability distributions when only but a few average properties are all that is known. Maximizing 
the entropy functional subject to the constraints provided by these averages, leads to probabil- 
ity distributions that have the least bias with respect to the known information \38\ \39\ [66l HQ] . 
For instance, maximisation of the entropy constrained to satisfy normalisation and a given mean 
value leads to the exponential density. Likewise, maximizing the entropy constrained to satisfy 
normalisation for a given mean and variance leads to the Gaussian density. For point processes 
\46\ [16] the entropy is defined with respect to some spatial window of observation A, and has two 
sources of uncertainty, the first is related to the counts within A, and the second is related to the 
locations of the n points inside this window. Truncating the hierarchy at order two assumes that 
only configurations involving up to three points possess irreducible spatial information. We carry 
that assumption forward onto the locational component of the full point process entropy functional, 
which we then maximise subject to the constraints of normalisation and product densities up to 
order two, which are given by the truncated hierarchy. We exploit formal relationships between the 
product densities and the probabilistic objects used to construct the entropy functional of a point 
process — the Janossy densities — that allow the incorporation of the product density constraints 
onto the entropy functional, and then translate the results of the maximisation procedure in terms 
of product densities in order to obtain a closure expression. 

Our result differs from other maxent closures, like those of Singer [69j and Hillen ^35j, in a 
number of ways. First, it is implicit, in the sense that the third order density appears in both 
sides of the closing expression for truncation at second order. We do so because the Kirkwood 
closure arises naturally from independence considerations |69) for spatial scales larger than the 
minimum distance for which the pair correlation function is not constant, but it is not valid within 
the domain of irreducible triplet correlations, i.e. the probability of observing a third point in the 
triplet depends on how the first two are chosen. If improvements to the Kirkwood closure are to be 
made, irreducible triplet correlations must appear in the closure. In the maxent method we propose 
irreducible third order correlations are generated by iteration of the closure relationship, while the 
first and second order densities, generated by the hierarchy, are held fixed. Second, we assume 
that these irreducible third order correlations are confined to a finite window, or spatial scale 
which is found by comparison of the normalisation condition for the correlated process with that 
of a Poisson process of the same mean density. Third, in contrast to other existing approaches, 
we used all the moments up to the order of the truncation (including the zeroth) to constrain the 
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entropy functional. This is critically important because the zero-th moment is associated with the 
normalisation constraint, which allows the determination of the domain of triplet correlations. 



The variational problem is formulated in terms of the locational entropy functional of the 
marginal spatial point process. In order to introduce the product densities as constraints, we 
exploit known expansions of these in terms of the Janossy densities |14l [57] that constitute the 
probabilistic objects (the likelihoods) required to construct the entropy functional. Whereas Singer 
[69] used the k-th order product density to constrain an entropy functional, and Hillen [35], used 
an L^-norm of the moment hierarchy for this purpose, we used instead the classical definition of 
the entropy functional for a point process, based on the full battery of Janossy densities [Ml [16] . 

The implicit, order two maxent closure ^ resembles the structure of the power-3 or Kirkwood 



closure (23), but is complemented by a number of correction terms that depend on averages of 
the product densities for each scale at which triplets are irreducible. Outside this domain, these 
correction terms vanish and the closure becomes identical to the power-3. There are two scales of 
relevance in the closure, one where irreducible triplet correlations are important, and another one 
where these can be expressed in terms of second and first orders only. 

For the sake of completeness, we first discuss known results related to the entropy of spatial 
point processes in subsection |4.3[ and the key expansions of Janossy densities in terms of product 
densities. This is followed by the derivation of the implicit maxent closure for truncation at order 



two (41). 



4.3 The entropy of a point process 

The Shannon (or information) entropy H[V] of a stochastic process V, interpreted as the average 
uncertainty (or information content) associated with a given outcome of "P, is defined as minus the 
expected value of the log-likehhood L ^il [M [55115^ [iUl [HB] . 

H[V] = -E{\og{L)}. (24) 



The specialisation of the entropy ( 24 ) to point processes requires a special form of the likelihood, 
given that in a realisation of a point process of the form {xi, . . . , Xn} in a window A there are two 
sources of uncertainty. The first comes from uncertainty about the number of points n within A 
(the counts), which is controlled by an integer- valued probability distribution pn = Pr{N{A) = n}. 
Conditionally on the value of n, the other contribution comes from the uncertainty associated with 
the locations of the n points, which is given by a symmetric (in the sense of invariance under 
permutations of the indices) probability density Sn{xi, . . . , Xn \A) on Thus, the likehhood of a 

spatial point process is the probability of finding n points within A, each in one of the infinitesimal 
locations dxi, . . . ,dxn and nowhere else within A. This coincides with the definition of the local 
Janossy density [H ] [TB I l37j 

La{xi, ...,Xn)= PnSnixi, ■ ■ ■ , Xn\A) = j„(xi, . . .,Xn\A). (25) 

Separating the contributions due to the counts and those due to spatial information, we can repre- 
sent the entropy of a point process Ma on a window A by [14] [TB] 

oo oo „ 

H[J\fA] = -y^Pr'^Og{r\pr) -y^Pr / Sr{xi, . . . Xr) log[Srixi, . . . Xr)] dxi ■ ■ ■ dXr, (26) 
r=0 r=l 
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where the integrals calculate the contribution due to the locations, an the sums that of the counts. 
If we fix the expected number of points in A, fj, = mi \ A\ = 'E[N{A)], it can be shown that the first 



sum in ( 26 ) is maximized by the Poisson distribution \16\ HOl H6] , 



Pr = — -exp(-^). 

Conditional on the counts r, the second sum is maximized by the uniform density on A^'^^ 

1 



Thus, the point process of maximum entropy is the homogeneous Poisson point process with first 
order density mi |15^ [T6] . For closure purposes we use the definition ( 24 ) written in terms of the 
local Janossy densities 

oo ^ 

HIMa] = -y^^—j jnixi,. . . ,Xn\A) log[jn{xi,. . . ,Xn\A)]dxi- ■ -dXn, (27) 

n=0 ''^ 

where division by n\ ensures normalisation with respect to the n! permutations of the n indices. 



Our method of closure consists of maximizing (27) constrained to satisfy the product densities up 



to the order of truncation. These can only be meaningfully incorporated as constraints if they 
can be expressed in terms of integrals over A of the Janossy densities. We do this by using the 
expansion [H], 

m„(xi,...,x„) = V]— / jg+n{xi,...,Xg,yi,...,yn)dyi...dyn, (28) 
where the inverse relationship, 

°° (—1)1 r 

jn{xi,...,Xn\A) = y~] — / mn+q{xi,...,Xn,yi,..-,yq)dyi...dyq, (29) 

,= 

can be used to translate the results of the constrained optimisation procedure in terms of product 
densities in order to yield a closure for the product density hierarchy. 



4.4 Maximum entropy closure at order k = 2 

In the case of the non-homogeneous Poisson point process, which maximizes the entropy functional 



(27), all the points can in principle depend on the specific locations, but these are uncorrelated. 



For this special case the expansion of the likelihoods in terms of the product densities ( 29 ) takes 
the simplified form. 



Jn \Xl , • . . , Xj- 



1^) = ll'^i{xp)Y.^—^llmi{yi)\A\K (30) 



p=l q=0 ^ 1=0 
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Figure 5: Estimated radial pair correlation functions at equilibrium 52(^) from simulations of the 
point process in Section 2.3 with dispersal and mortality kernels given by symmetric bivariate 
Gaussians. Parameters lead to a mildly aggregated pattern (case b, dashed line) and a segregated 
pattern of clusters (case a, continuous line). In (b) we note that correlations decay quickly and 
become constant at a spatial lag r > 0.2, whereas in (a) there are distinct patterns in at least two 
spatial scales. Aggregation in the smaller ones, and segregation at intermediate ones. 



If the process is a spatially stationary and homogeneous Poisson point process, then all the product 
densities become simple powers of the mean density |2H I14j. which further simplifies (29) to, 



jnixi 



\A) = mi" exp(-mi|A|). 



Thus the probability of observing n points within a window A is 

Pr [N{A) = n] = ^ / j„(xi, . . . ,Xn\A) dxi ■ ■ ■ dxn, 



(31) 



(32) 



which after substituting (31) into (32) leads to the Poisson distribution 



Pr [N{A) 



n 



(mil^l)" exp(-mi |^|) 
^1 ■ 



We assume somewhat crudely that the Janossy expansions of the point process associated with the 



moment hierarchy have an intermediate structure between the two extreme cases (29) where the 



spatial configurations of all orders are irreducible, and the Poisson point process ([31 ) where all the 
locations occur independently. This assumption can be justified from the truncation assumption, 
since truncating the hierarchy at order two implicitly assumes that terms of order equal or higher 
than four do not contribute to the formation of second and third order spatial correlations. Also we 
see in estimates of the pair correlation functions for the point process discussed in Section [2| shown 
in Figure ^ that there is a region in the parameters for which the spatial correlations of second 
order decay quickly. Case (a) corresponds to segregated clusters and thus the pair correlation 
oscillates around one. There are two different scales with pattern there. One associated with the 
clusters (the region where g'2 > 1) and another with the separation between the clusters themselves 
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(52 < !)• Case (b) on the other hand corresponds to a simply aggregated pattern. In this latter 
case we see clearly that there is a spatial scale for which the pair correlation function becomes 
constant and identical to one, therefore 

for some spatial scale rg. This assumption is tantamount to requiring that the Janossy expansions 
of the process to have the form, 



Jn 



k+l-n 

E 

q=0 
n 



oo 



+ JJmi(a;p) ^ 



m„+g(xi, ...,Xn,yi,...,yq)dyi... dyg 
(-1)" 



q> k+l—n 



AM 



Y\_mi{yr)dyr 



(33) 



r=l 



where the first term corresponds to the terms that make contributions due to spatial correlations, 
and the second term is the (non- homogeneous) Poisson remainder. For k = 2, equation (34) 
becomes 



jni^li • • • J X, 



run+qixi, ...,Xn,yi,..-,yq)dyi... dyg 



+ \\mi{xp) V — / \\mi{yr)dyr. 

p=l q> 3— n r=l 



(34) 



The closure assumption implies that only the Janossy densities of order up to /c + 1 make contri- 
butions to the locational entropy, in which case the entropy functional (27) becomes 

3 



-MA)\og[MA)]-J2 E 



n=l l<ii<...<j„<3 



(3-ra)! 
3! 



(35) 



/ jn{Xi^,---,Xi„\A) log[jniXi^, ■ . . ,XiJA)]dXi^ ■■■ 



dxj 



where Jq{A) is the avoidance probability in A. The first constraint added to (36) is that of nor- 
malisation, 



n=0 



jn (2^1 ) • • • ) Xn) dx\ ■ ■ • dx-rii 



which after simplification with the assumption (34) can be added to the entropy functional 

3 



+K,.\uA) + Y, E 

q=l l<ji<...<j5<3 



(3-g)! 
3! 



A(9) 

00 »t 00 , I 



jq{,Xi^i • • • , Xi^ \ A^ dxi^ . . . dxi 

i-iy 



-En^^i^^*) E "^pll / ^ mi{yr)dyr-l 

n>3i=l l>3-n ' r=l / 
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where Aq is a (constant) Lagrange multiplier. The second constraint is that of the first order 
product density mi{xi) 



l<il<3 \q=0 l<ii<...<: 



(3-g)! 
3! 

...<jq<3 



AM 



(36) 



where Ai(xi^) is a vector of functional Lagrange multipliers, each associated with the permutations 
in the locations xi,X2 and X3 comprising the triplet. Finally, the constraint for the second order 
product density m2{xi^,Xi^) is 



+ 



l<n<i2<3 ^ \q=01<h<...< 



(3 



3! 



..<ig<3 

j2+q[Xi^,-- .,Xi„,yi^,.. .,yi^ \ A)dyi^ . ..dyi^ 

- m2{xi^,Xi^) dxi^dxi^. (37) 

Likewise, the A2(xij, Xi^) are the Lagrange multipliers associated with each of the permutations of 
the pairs in the triplet. The Euler-Lagrange equations of the functional (36)-(37) are 



6 MA) 

Sj2{Xi-,,Xi2) 
Sj3{xi,X2,X3) 



- 1 - log[Jo(^)] + Ao = 0, 



^(1 + logji [{xi,)]) + ^Ao + ^Ai(xiJ = 0, 



Ku <3 



hl + log[j2ixi^,Xi2)]) + ^Ao + ^Ai(xiJ + ^A2(xii,Xi2) = 0, 1 < ii < ^2 < 3 
fa fa 3 fa 



- ^(l + log[j3(a;i,a:2,a;3)]) + Jao + [Ai(xi) +Ai(x2) 
fa fa 2 

+ Ai(x3)] + ^ [A2(X1, X2) + A2(X2, X3) + A2(2;i, Xs)] = 0. 



(38) 



It can be seen by inspection that each of the second variations is inversely proportional to minus the 
Janossy density of order k. Since these are all probability densities, each of the second variations 
is negative and thus the extrema given in the first variation (38) are maxima. Solving the Euler- 
Lagrange equations ( 38 ) for the Lagrange multipliers yields 

Ao = 1 + log[Jo(yl)] 

jiixiY 



Ai(xi) = log 
Ai(x2) = log 



MA) 

jl{x2) 

MA) 
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Ai(a;3) 


1 

= lo 


A2(xi,X2) 


= lo 


A2(X2,X3) 


= lo 


A2(a:i,X3) 


= lo 



MA)h{xi,X2) 

Jo(^) j2(x2,a:3) 



(39) 



After substituting the Lagrange multipliers in (39) into the equation for the first variation with 



respect to js in ( 38 ) yields an expression that relates the Janossy density of third order to the lower 



order ones under the assumption of maximum entropy constrained by the moments, namely 

J2 (Xl , X2 1 ^) j2 (X2 , X3 1 ^) ^2 (xi , X3 1 A) 



h{xi,X2,Xz\A) 



jl{xi\A)jl {X2 1 A )ji (X3 1 A) 



UA), 



(40) 



Equation (40) is formally similar to the Kirkwood closure. However, there are a number of im- 
portant differences. First, it varies with the choice of the window A, since it depends on the local 
likelihoods (see Figure [g]) rather than the product densities used in the Kirkwood closure, which 
are global properties that do not depend on the window of observation. This domain A depends 
on the spatial scale for which the third particle in the triplet becomes independent of the other 
two. Second, the closure is weighted by the avoidance probability Jo{A). This term is conceptually 
similar to the exponential weight suggested by Meeron [47j and Salpeter |5D], but now arises from 



a maximum entropy consideration. The relationship (40) can be used as a closure of the moment 



hierarchy after using the expansions ( 29 ) and ( 34 ) that allow the Janossy densities to be expressed 
in terms of product densities. 

Since the underlying point process is spatially stationary by construction, then the mean density 
is constant, and the densities of higher orders depend on the relative rather than absolute distances 
between points. After rescaling the product densities in the expansion by the area of the window A 
(the product densities that come from the hierarchy are defined in terms of the much larger spatial 
window used to observe the full process) we have that the maxent closure is given by 



if 161 < "Tq and |6| < and |6 - 61 < '^o 



"i3(6,6) 



"^2(6) - |-4o| / ?T13(6,?2) d^2 

JAo 

"^2(6)-|^ol / m3(6,6-Cl)dei 
^2(6-6)- l^ol / m3(6-el,el)del 

JAn 



(41) 



Jo(^o) 



mi - l^ol "^2(61) C + ^ ^4(2) m3(ei, e^) dC^ 



else 



"13(6,6) 



"T-2(6) ?"2(6) y»2(6 - 6) 



(42) 
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Figure 6: The domain A represents the region beyond which a third particle becomes independent 
of the other two. Shifting x'^ to X3, makes that third point independent of the other two, in which 
case the triplet requires only information about second and first orders density, since the two points 
along the ^1 edge are still correlated. This corresponds to the spatial scale for which the assumptions 
leading to the Kirkwood closure are valid. 
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where the circular domain of radius tq is determined from the normahsation constraint (described 
below). The avoidance function Jq{Ao) is given by 



MAo) 



1 -mi|Ao| + 



Ao 



(2) 



71=4 



(43) 



and the summation term is equal to 



(mil^ol)" = exp (-mil^ol) - 1 + "1i|^g 



(mil^ol) , {rni\Ao\) 



+ 



n=4 

After simplifying we have 

Jo(Ao) = exp(-mil^ol) + 



6 



Ao 



{mi\Aa 



(2) 



"^3(6,6)^6^6(44) 



+ 



{mi\AQ\f 
6 



In order to obtain the family of sets Aq in the correction terms of the closure, we first need to 
identify the spatial scale rg beyond which two points become independent. This is equivalent to 
finding the smallest region for which the correlated point process has the same statistics of a 
Poisson process of the same mean density. This domain is obtained by comparing the avoidance 
functions for each case, which must coincide for this specific set. Since the avoidance probability 
for a homogeneous Poisson point process of intensity mi for some reference window B is equal p!^ 
to 

Jo*(S) =exp(-mi|S|), 

Thus the set Aq must satisfy 

Jo(^o) = Jo(^o). 



(45) 
(46) 



Substituting the rhs of ( 45 ) and ( 45 ) into (|46j) leads to the integral equation 
j m2{ii)dii - ml\Ar\ - ^ / m3(6,6)(^6c?6 + 



(47) 



where A^- = B{0, r) is the ball of radius r centered at the origin. Since all the product densities are 
given by the hierarchy and the closure relationship (41), the only unknown in (47) is the domain 
Aq that satisfies the equality ( 47 ) . This can be found by evaluating the rhs of ( 47 ) for an increasing 
family of domains A^. The values of for r that satisfy the equality are the roots of interest. There 
are four possible scenarios for these roots: 



1. The trivial root, r = is the only solution. This is always a solution by simple inspection. 

2. A single non-trivial root r* . 

3. A finite number of n non trivial roots r*, rg, . . . , r* . 
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4. An infinite number of roots. 

A criterion of validity for the closure scheme can be built on the basis of the number of roots. 
Case [T] indicates that there is not a scale within the observed range of r for which correlations 
decay as powers of the mean density, and thus truncation should be tried at a higher order. Case [2] 
indicates that there is a single Poisson domain and thus the closure assumptions are consistent 
with the predicted values of the hierarchy. Case [3] indicates that there are several scales of spatial 
pattern, due to correlations that oscillate as they decay, i.e. segregated clusters (see Figure [5]). In 
this situation each scale of pattern should be treated separately. An infinite number of roots (case 
[4]) indicates that the process is indistinguishable from a Poisson process at all scales. 

Although the closure expression seems complicated, we note that if the area oq = |^o| is small, 
then the integral correction terms are of similar magnitude, and relatively small in comparison 
with the correction introduced by avoidance probability, which by far dominates the closure. In 
this situation we have a much simpler approximation to the exact closure, given by 

rc t N _ "i2(6)"i2(6)"T-2(6 -6) / I . |x /.gx 

"^3(6,6) ~ 5 exp(-mi^o )• (48) 

7T) 1 



5 Numerical implementation 

The numerical solution of the hierarchy with the maxent closure requires two separate modules of 
code: one for the integration of the hierarchy itself, and the other for the iterative procedure that 
computes the third order density. The first, which we call the 'outer' code, consists of a standard 
numerical integration scheme that predicts the first and second order product densities at a time 
{t + h) using the first, second and third order ones at time t as input, where /i is a small time 
step. The second module, or 'inner code', computes the third order density at time (t + h) from the 
maxent closure. The inner code starts by computing an initial value for the area of normalisation 
A^^'^^ using the values of the first and second order densities at time {t -\- h), and the third order 
density at time t as an initial trial. This first value ^q"''^^ is then substituted in the maxent closure 



expression (41) to produce an updated value for the third order density. The area of normalisation 



is recalculated with the updated third order density to produce a new value Aq"^"'^; if the relative 
difference between the old and the new radii associated with each normalisation area falls below 
some pre-specified tolerance, then the iteration stops and the final value of the third order density 
at time {t + h) is the one being used to calculate the last iteration of area of normalisation. If 
not, the iterations continue until the tolerance is achieved. We now propose an algorithm for the 
implementation the maxent closure, and subsequently show its performance for a broad range of 
parameters of the spatial scales. Our numerical results are well behaved and convergence of the 
iteration scheme occurs rapidly for a sufficiently small time step {h = 0.1), where typically two or 
three iterations of the closure are sufficient for a relative error tolerance within one percent. The 
problem consists of solving the coupled system 

imi{t) = rmi{t)-dNjj^W{Ci)m2{Ci,t)d^i 
\ft^2{ii,t) = hj^B{i2)m2{ii-i2,t)di2 + hB{ii)mi{t)-dm2{iut) (49) 



- dN W{ii) m2(ei, t) - dN /r W{i2) ^3(6, 6, t) d^2- 
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where F C is the computational window. The initial condition 

mi(0)=no, m2(6,0)=ng, m3(6, 6, 0) = nj]. 

The window T should be large enough to approximate correctly the integral terms so that the scale 
for which the second and third product densities respectively decay to and mf lie well within 
the computational windowF. This hierarchy can be closed at order 2 with the maxent closure (41) 

1712 (6)- l^ol / ^3(6,^2)^^2 

JAo J 

m2(6)-|viol / m3(6,6-e'i)dei 

JAo 

: m2(6-6)- l^ol / m3(6-el,ClXi 



"13(6,6) 



(50) 



Ao 



MAo) 



which is applied if each the three distance vectors (6,6 and 6 ~ 6, see Figure [g]) connecting the 
three points in the triple configuration fall within the normalisation domain Aq. Outside of this 
region we apply the Kirkwood closure on the basis of probabilistic independence of the third point 
in the triplet, as discussed in the previous section 



"13(6,6) 



"12(6) "12(6) "12(6 - 6 



(51) 



In the maxent closure (50) the avoidance function Jq{Aq) is given by 

Jo(^o) = exp (-mil^ol) • 

The circular domain Aq is computed from the comparison between the normalisation constraint for 
the truncated hierarchy and that of a Poisson process of the same mean intensity. It is calculated 
by finding the value of r that satisfies 



/ 

J A, 



"l2(e'lX -"1?|A 



1 



^,,"l3(Cl,C^)d^We2+"'^ 



where A^ is the 2-dimensional ball of radius r centred at the origin 



0. 



(52) 



5.1 Algorithm for the numerical implementation 

The coupled system of product density equations with the maxent closure can be solved from the 
following algorithm: 

1. From a sequence of radii = 0, . . . , r^ax, construct an increasing family of domains Ar^. 
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2. At time t = the initial configuration is given by a homogeneous Poisson point process, thus 
ah the product densities are powers of the mean density No/\X\, where X is the computa- 
tional spatial arena, and A'^o is the population size at time t = 0. 



3. While the elapsed time t < T^ax do 

(a) Integrate forward the densities mi(t + h) and m2{£,i,t + h) from the hierarchy using a 
standard numerical procedure. 

(b) Use the value of the triplet density at the earlier time step ms^°'-'^\S^i,^2,t) as the initial 
guess in the normalisation condition for the Poisson area Aq. Generate a sequence of 



values /(rj) by calculating the the normalisation condition (52) for each the domains 
previously constructed in Step [T] according to 



+ -mi^{t + h)a. 



(53) 



where the a^. are the areas for each of the Ar 



(c) Find the largest value Tq that satisfies f{ro) = by linear interpolation between the 
consecutive where /(r^) changes sign. 



(d) Use To from Step 3c to generate the estimate of the Poisson domain Aq = Ar^ . 



(e) Loop the spatial arguments and ^2 over the computational spatial arena. 

(f) Compute the magnitudes di, d2 and ds of the the distance vectors ^1, ^2 and .^2 — S,i 

(g) if di < ro and d2 < tq and d^ < tq apply the maxent closure 



exp(-mi \Ao\) 



mi - Ao m2(ei) d^i + f /ij rn3(°'<^)(el, < dC 
Jao J 

JAo J 

^2(6 - 6) -Ao [ m3("'^)(e2 - e'l, e'l) 
J An 
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(h) else use the Kirkwood closure 

m3("^-)(6,6) = (54) 

mi'' 

(i) Recompute the Poisson domain A^^^^ and its radius r^^^^ by inserting the corrected 



triplet density m^^'^^'^^ from Step 3e into the normalisation equation into Step 3c and 
estimate a new root r„. 

(j) If the difference between the old radius and the new one falls within the error tolerance 

^(new) 

< tolerance 



then the third order density at time {t + h) is the one calculated at Step 3e 

else the old third order density becomes the new third order density 

^3 ^ rn-i^oid) 



and repeat Steps 3c through pl^ until the error falls within the tolerance. 



4. update the elapsed time 

t^t + h. 

5.2 Closure performance 



We applied the simulation algorithm introduced in the previous Section 5.1 using a spatial dis- 
cretisation of 47 points per linear dimension, and the domain B was the unit square [—1/2, 1/2] x 
[—1/2, 1/2]. The spatial integrals were computed using the trapezoidal rule, and the convolution 



in (49) was calculated using the fast Fourier transform. For the solution of the moment hierarchy 
we use a fourth-order Runge-Kutta scheme (with a time step h = 0.1). Convergence was checked 
by halving the time step and the spatial discretisation and no significant differences were found ( 
m\ = 168.6, Ax = 1/47, /i = 0.1 and m\ = 168.9, Ax = 1/95, = 0.05, for aw = crB = 0.05). 

The maxent closure is expected work well in situations where the spatial scales of dispersal and 
mortality are similar, since this combination of parameters tends to produce a single scale of spatial 
pattern of mild aggregation (see Figure [s]) , where higher order terms are small. Figure [7] compares 
the dynamics of the mean density predicted by the maxent closure in a mildly aggregated regime 
[as = CFw = 0.05 ) against averages of the point process model and the other closure methods 
used in the literature, power-1, power-2 and power-3 (but the asymmetric power-2 is not used in 
the comparison). We see that the maxent closure outperforms the other closures. As before, in all 
cases the transient is predicted poorly. This is to be expected of the maxent method, because the 
locational entropy can be assumed to be maximised only once the stochastic process has reached 
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Figure 7: Comparison between the mean density (jagged blue line) for a sample of 300 simulations 
of the point process for the mildly aggregated case as = 0-05, aw = 0.05 (open circles) and 
the truncated product density hierarchy using various closures. The the maximum entropy closure 
(maxent) (continuous black line), the power-3 (dash-dot), the symmetric power-2 (dot) and power- 
1 (dashed). The maximum entropy closure provides the best fit to the equilibrium values of the 
IBM. However the performance of all the closures is poor during the transient regime. 

its stationary distribution. For this reason, even with the correction terms, the truncated hierarchy 
with the maxent closure fails at capturing the transient behavior, which typically consists of long 
range spatial correlations that decay only once the density-dependent mortality term is large enough 
to cause mixing at longer scales, thus producing a shorter correlation scale. 

The ability of the maxent closure to predict accurately the mean density changes dramatically 
when the two interaction kernels have very different characteristic scales. This combination of 
parameters leads to several scales of pattern, that can consist of short range aggregation compen- 
sated by long range segregation, or short scale segregation compensated by long range clustering. 
This occurs because the total number of pairs over sufficiently long ranges must be equal to the 
density squared. Thus, extreme aggregation over short scales must be compensated by segregation 
over the longer scales in order to preserve the total number of pairs. When dispersal has a much 
shorter characteristic scale than that of density-dependent mortality, the resulting pattern consists 
of segregated clusters. This situation violates the closure assumptions (that require a single scale of 
pattern), and we expect the validity checks in the maxent closure to be activated in this situation. 
This is illustrated for two types of aggregated patterns in Figure [Sj The upper three panels corre- 
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Figure 8: Behavior of the area of corrections in the maxent closure for two types of agregated 
spatial patterns. The upper three panels correspond to a segregated pattern of clusters with 
cjb = 0.02, a\y = 0.12, and the lower panels to a mildly aggregated pattern with as = cfw = 0.04. 
The left column shows a single point pattern at the end of the simulation, the middle column shows 
a kernel density estimate of the pair correlation function for the pattern displayed in the left and 
the right column shows the temporal behavior of the area of the set in the correction terms. 
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spond to segregated clusters {as = 0.02, aw = 0.12), and the lower three to the mild aggregation 
case discussed earlier (as = aw = 0.04). The left column conformed by panels (a) and (b) show 
typical point patterns obtained at the same time at which the numerical solution of the hierarchy 
stopped, t = 1.56 in (a), because of the validity check, and f = 80 in (b) which was long enough 
to reach equilibirium. The center column, consisting of panels (c) and (d), displays kernel density 
estimates of the pair correlation function for the point patterns shown to the left. We see in panel 
(c) a very high degree of aggregation at short scales followed by long range segregation. Finally, 
panels (e) and (f) show the dynamics of the area of correlations Ao^t) for both regimes. We see 
failure of the maxent closure to find a non-trivial root for Aq in panel (e) after a short transient, as 
should be expected due to the presence of various scales of pattern detected in the pair correlation 
function in panel (c). In this situation, the extreme form of 'checkerboard' aggregation requires 
truncation at a higher order. Since the pair correlation function is clearly not constant, but yet 
the normalisation constraint only finds the trivial root zero, the validity check is activated and 
the numerical solution of the hierarchy stops. By contrast, in the lower panels when the degree of 
clustering is comparatively smaller, the method succeeds in finding a single root that eventually 
reaches a single equilibrium (see panel (f)). 

We carried out a systematic exploration of the behavior of the maxent closure for a wide range 
of combinations (441 in total) of the spatial parameters falling within the range [0.02,0.12] that 
correspond to those explored earlier by Law et al [Hj , and compare the results with the predictions 
of the point process, and the product density hierarchy with the power-3 closure. This allows the 
assessment of the relative importance of the correction terms in the maxent closure. The upper 
limit in the parameter domain was chosen because for that scale [as = (^w = 0.12) there is only 
a very small departure from complete spatial randomness. Figure [9] shows various equilibrium 
values predicted by the product density hierarchy with the maxent closure. Panel (a) corresponds 
to the mean density, panel (b) shows the equilibrium value of the second moment at the origin, 
normalized by the mean density squared, and finally, panel (c) shows the area of normalisation at 
equilibrium. The removed regions (white) in panel (a) result from the application of the validity 
check of normalisation, since for this parameter the area of correlations is zero (see panel (c)), but 
the second order product density indicates the existence of spatial pattern. 



In Figure 10 we compare the mean equilibrium density predicted from an average of the the 
space-time point process (a), the maxent closure (b), and the power~3 closure (c). The maxent 
closure is not a good predictor of the mean density for intermediate to low ranges of mortality 
combined with long range dispersal; in this regime both the qualitative and quantitative behavior 
of the closure is poor. We see a sharp drop in the values of the mean density, whereas in the point 
process model it grows monotonically before reaching the plateau that occurs when both dispersal 
and mortality act over long scales. This combination of parameters leads to segregation at short 
scales and long range (albeit mild) aggregation. The maxent method detects only the scale of 
aggregation, which produces comparatively larger values of the area of correlations (see panel (c) 
in Figure [9]). This leads to over-correction in the maxent closure, which results in an equilibrium 
density that falls well below that predicted by the point process model. In this regime, the power- 
3 closure provides a much more precise prediction of the equilibrium density, both qualitatively 
and quantitatively. For sufficiently short ranges of dispersal together with short to intermediate 
ranges of mortality the point process model predicts extinction, as already noted earlier by [13} 
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Figure 9: Simulation results of the product density hierarchy with the maxent for various values of 
the characteristic spatial scales of dispersal aB (horizontal axis) and mortality (vertical axis). 
The left panel (a) shows the equilibrium mean density mi*. The center panel shows the value of 
the second order product density at equilibrium evaluated at the origin, normalized by the squared 
mean density. In this panel values higher than one indicate clustering at short scales, and values 
below one indicate segregation. The right panel (c) shows the value at equilibrium of the area of 
the domain used in the correction terms Aq. 



In this regime, neither the maxent closure nor the power — 3 closure is capable of predicting the 
persistance/extinction threshold, and the maxent validity check does not seem to operate either. 
However, for intermediate ranges of aggregation close or above the main diagonal {aw = ctb), the 
maxent closure does provide an improved prediction of the equilibrium density, with the added 
benefit of the criterion of validity being activated when dispersal is short range with long range 
mortality, which leads to different scales of pattern. 



We computed the relative error between the equilibrium density of the point process, and 



that predicted by the moment equations with the two closures, shown in Figure 10 Panel (a) 



corresponds to the maxent closure and panel (b) to the power-3. We see that the maxent closure 
has larger relative error than the power-3 for values located below the diagonal {ayy = (Tb), which 
are associated with segregated spatial patterns (see Figure [9| panel (b)). In contrast, the power-3 
closure performs quite well in this region. The advantage of the maxent closure becomes more 
noticeable on, and above the diagonal, which is associated with aggregated patterns. The ability 
to predict correctly the equilibrium density in this regime is nearly optimal; particularly when the 
two scales have similar magnitudes, even when both dispersal and mortality act over short ranges. 
The regions of the parameter space for which each of the two closures is relatively more useful 
are shown in Figure 12, which displays the difference in relative error between the two closures 
AE — errp^ — err maxent- Positive values of AE indicate that the error in the power-3 closure is 
larger than the maxent closure, and vice versa for negative values of AE. As discussed above the 
largest improvement of the maxent closure around to the region where the two scales are of similar 
magnitude. 
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Figure 10: Comparision of the mean density at equilibrium predicted by an ensemble average 
of the point process model (a), the maxent closure (b), and the power-3 or Kirkwood closure 
(c). In panel (b) the white region no the upper left corner corresponds to the domain where the 
normalisation constraint returns a trivial root for values of the second order product density that 
indicate the presence of spatial pattern, activating the validity check (47). 




Figure 11: Relative error of the maxent closure (a) and the power-3 closure (b). We see that the 
maxent closure performs better than the power-three closure for mildly aggregated patterns (lower 
left), but the Kirkwood closure outperforms the maxent in segregated patterns (lower right) 
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Figure 12: Difference in relative error between the maxent and power-3 closures for various 
combinations of dispersal and mortality spatial scales. Values higher than zero indicate that the 
maxent closure outperforms the power-3 closure, whereas negative values are evidence of better 
precision of the power~3 closure. 



6 Discussion 



The results of this research resonate with previous work [H I53j that demonstrates that the anal- 
ysis of stochastic, locally-regulated, individual-based models of population dynamics in continuous 
space is feasible [531131133]. The numerical implementation of the maxent closure is computationally 
more expensive (about twice as much) than existing closure methods, but is nonetheless faster than 
resorting to direct simulation of the point process; if one is willing to approximate, the simplified 
closure based solely on the exponential correction (48) is substantially simpler to implement, and 
produces very small errors in comparison with the full maxent closure. Although a number of 
moment closures have been proposed in the literature, some using heuristic arguments, and others 
based on constrained entropy maximisation, very few, if any have a criterion of validity, with the 
exception of Ovaskainen & Cornell [53] who were able to derive a series expansion for the mean 
density of a spatially explicit metapopulation problem, and show rigorously that their approxima- 
tion to the mean density is exact in the limit of long range interactions. The principal benefit of 
the maxent method lies in the fact that the normalisation constraint used to find the domain for 
the correction terms fails to find a non-trivial root when the closure assumptions are not met. This 
situation occurs when higher order terms are required in order to fully capture the dynamics, or 
when correlations extend over a range that goes beyond the window of observation. This property 
constitutes a validation check, not present in other proposed closure schemes. 



Although the power-3 or Kirkwood closure had previously been derived from maximum entropy 
arguments [69] (but using a different set of constraints and a different definition of the entropy func- 
tional), the correction terms presented here are new, and extend the probabilistic interpretation of 
the Kirkwood closure to situations where there is a region of irreducible triplet correlations. These 
correction terms introduce significant improvements in the agreement between the simulations of 
the stochastic process (for mildly aggregated patterns) and its deterministic approximation by the 
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product density hierarchy. It remains to be seen how the maxent closure behaves for other func- 
tional forms of the interaction kernels, particularly for those that have tails that decay algebraically 
( i.e. power laws) instead of exponential. Another area of further work would be related to changes 
in the value of the non-spatial carrying capacity K. For higher densities, spatial effects become 
less important. 

Since the derivation of the method does not depend on the details of the model, but only on 
that its equilibrium distribution is of maximum locational entropy with moment constraints, the 
maxent closure may be useful beyond spatial ecology where unclosed hierarchies for particle distri- 
bution functions are also commonly found, for instance in the statistical mechanics of fluids where 
the Kirkwood closure was first introduced [69j, or in problems where the organisms move in space 
[H 125^ 178]. provided that the correlation functions in those models are stationary in both space and 
time. A limitation of the method is its poor ability to predict the transient. This is to be expected, 
since maximum entropy is a meaningful property of the equilibrium distribution only when detailed 
balance is satisfied |74l [271 [38i| and the transitions due to fecundity and dispersal events coincide 
with mortality. Other areas of current and future work include the generalisation of the moment 
hierarchy and the maxent closure to an arbitrary order of truncation, extensions to marked spatial 
point processes for populations with both spatial and size structure. 

Appendix. Derivation of moment equations 

In order to derive the equation for mi{t), we start by fixing a small region of observation dxi (so 
that the count inside dxi, N(dxi) is either or 1) and write a Master equation for the probabilities 
of change in the count ANst{dxi) during a small time interval 6t, defined as 

ANstidxi) = Nt+st{dxi) - Nt{dxi). 

These come from the birth and death transitions. Births are given by the probability that the 
count N{dxi) increases by one in 6t due to a birth in dxi 

N + l, 

This probability is controlled by the fecundity rate and the dispersal kernel, 

f{xi\(pt) = P { one birth in (dxi) during (t, t + (^t) I (/3t(^)} • 

b ^ B{xi - Xn) Ntidxn)i{dxi) 6t + o{5t), (55) 

where b is the birth rate, i?(^) is the dispersal kernel, (ft is the configuration of points at time t 
and ^{A) is the area of the 2-dimensional domain A. For the death of the individual in dxi, we 
have the transition 

N -I, 

controlled by 

li{xi\(pt) = P{ death of individual xi during (t, t + 5t) \ ipt{X)} . (56) 
= Nt{dxi) 



d + dN W{xi - Xn) {Nt{dXn) - (5x1 {dXn)) 



6t + o{6t), 
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where d and are positive constants defined in Section [2| the density-independent, and density- 
dependent contributions to the mortahty and is the mortahty kernel). This probabihty is 
conditional on there being an individual in dxi. The change in the count ANst{xi) is then given 
by both contributions 

AN5t{dxi) = fixi\ipt) - /i(xi|¥?t) 



so 



ANstidxi) 



b B{xi-Xn)Nt{dxn)£idxi) 



(57) 



Nt{dxi) [d + dN ^ W{xi - Xn){Nt{dxn) - 6^^{da 



5t. 



Taking expectations (ensemble averaging) on both sides and dividing by the duration of a small 
time interval St yields 



E{ANsMxi)} ^ B{x^-x^)E{N,{dxn)}i{dx^) 



E\Nt{dxi)id + dN Y Wixi-Xn)iNtid: 



'Xn) Sxi^dXji)^ 



after rearranging the second term, dividing both sides by i{dxi) and multiplying the second sum 
by i{dxn) / i{dxn) we get 



EjANstidxi)} 
e{dxi) 6t 



EjNtjdXn)} 

i{dxi) 



B{xi — Xn) i{dxi) — d 



E{Ntidxi)} 
l{dxi) 



£{dxi)e{dxr, 



taking the limits as £{dxi) and £{dxn) go to zero, and using definition of the product density (15) 
= bmi[xi,t) / B[xi — Xn) dxi — dmi[xi,t) 



- djsf W{xi - Xn)m2{Xl,Xn,t)dXn■ 
J?^i'2 

since the process is spatially stationary by construction and exploiting the fact that the dispersal 
kernel integrates to unity, yields 



Ami(t) 
6i 



bmi{t) — dmi{t) — d^ 



[ W{^i)m2{Cut)dCi 



finally, after taking the limit as (5t — ?• we get 
d 



mi{t) = bmi{t) - dmi{t) - dN / VF(^i) 7712(6, t) d^i. 
dt J ^2 



(58) 
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On setting r = b — d, we get the the generalisation of the logistic equation to the spatial case 
obtained by Law &: Dieckmann [l3] and Law et al, lawOS, but derived explicitly in terms of product 
densities, 



'^mi{t) = rmi{t)-dN I W{Ci) m2{Ci,t) d^i- (59) 

5R2 



dt 



Since m2 is unknown, we need an additional evolution equation for this object. We follow a similar 
procedure to that used for the mean density, but considering the expected change of the product 
of the counts in two observation regions dxi and dx2- This requires the consideration of how pairs 
of points are created and destroyed as individuals disperse and die. There are three possible ways 
in which changes to occur. The first if to fix the count Nt{dxi) and allow only Nt{dx2) to change. 
The second is the reverse situation, fixing Nt{dx2) and allowing only Nt{dxi) to change. The third 
is when both Nt{dxi) and Nt{dx2) change in a small time interval. We have that 

/\[Nt{dxi){Nt{dx2)-6^,{dx2))] = Nt{dxi)^{Nt{dx2) - 5^,{dx2)) (60) 

+ {Nt{dx2) - 5^,idx2))ANt{dxi) 
+ ANt{dxi)A{Nt{dx2) - S^,{dx2)) 

where the Dirac delta distribution is used to remove self-pairs. The following derivation for the 
second order product densities is based on the symmetry in the probabilities of a birth or a death 
event occurring at both extremes of the distance vector linking xi and X2- We also assume that a 
simultaneous change in both Nt{dx2) and Nt{dxi) is negligible 

¥[ANt{dxi)A{Nt{dx2) - 5^,{dx2))] = o{5t) 

and thus the transitions of second order can be written as 

A[Nt{dxi) {Nt{dx2) - 6^,{dx2))] = 2ANt{dxi){Nt{dx2) - 6,,{dx2)). (61) 



Since we already have an expression for ANt{dxi), given by ([57|), (61) becomes 
A[Nt{dxi) {Nt{dx2) - d.,,{d^2))] = 2 ■ {Nt{dx2) - 6,,{dx2)) 



b B{xi-Xn)Nt{dXn)e{dXi) 



6t. 



(dxi) id + dN Yj ^(^1 ~ Xn){Nt{dXn) - Sx^idXn 
\ x„&ipt 



Taking expectations, and dividing by both sides by St gives 
A[Nt{dxi) {Nt(dx2) - 6,,{dx2))] 



2 6t 



= b Y B{XI - Xn)E{Nt{dXn){Nt{dX2) 

- 5,,idx2))} iidxi) - dE{Ntidxi)iNtidx2) - d,,idx2))} 

- dN Yl W{XI - Xn)E{Nt{dxi){Nt{dXn) 

- 5x^{dxn)){Nt{dx2) - 5x^{dx2))} ■ 
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After dividing by £{dxi) and i{dx2), using the definition of product densities (15) and taking the 
continuum Hmit in both space and time, one arrives at the evolution equation for the second order 
product density 



1 d_ 

2 di 



m2{iut) = b[ B{i2)m2{ii-i2,t)di2 + bB{ii)mi{t)-dm2{ii,t) 

- dNW{i{)m2{ii,t)-dN f P^(6)"i3(ei,6,t)d6, (62) 



where we see the dependence on the third order product density in the last integral 
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